Code to clean the data file-by-file

Importing the necessary libraries

In [1]:
import pandas as pd
import csv
import string
import re
import nltk

nltk.download('stopwords')
nltk.download('names')
from nltk.corpus import stopwords
from nltk.corpus import names
from nltk import word_tokenize
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Aruna\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package names to
[nltk_data]     C:\Users\Aruna\AppData\Roaming\nltk_data...
[nltk_data]   Package names is already up-to-date!
In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

%matplotlib inline
pd.set_option('display.max_colwidth', 150)

(A) Read the CSV File

In [3]:
df = pd.read_csv("C:\\Users\\Aruna\\Documents\\input\\Amazon SNS.csv")

df['description'] = df['description'].apply(lambda x: " ".join(x for x in str(x).split())) # cast to str and collapse runs of whitespace into single spaces
 
df.head(10)
Out[3]:
   id    label       description
0  1737  Amazon SNS  SNS - SMS Failure delivery and retrieval of detailed SMS log I am using SNS service for SMS notification. 1) In the event of a failed SMS, how to ...
1  1736  Amazon SNS  Messages Coming from Multiple AWS Accounts to Single SMS Short Code We use Simple Notification Service to send transactional SMS messages to our c...
2  1735  Amazon SNS  Android FCM support in SNS Unity plugin? The Unity SNS plug-in doesn't build in Android, due to its use of the depricated GCM API in the AWSUnityG...
3  1734  Amazon SNS  SNS : OTP message delivery issue We are facing a issue with OTP message delivery in our production environment. There is mix behavior 1. The messa...
4  1733  Amazon SNS  Simulating message persistence with SNS using SQS We are evaluating SNS for our messaging requirements to integrate multiple applications. we have...
5  1732  Amazon SNS  Most SMS messages are not delivered I am trying to send SMS messages via SNS and quite a few messages sent to the same number are not delivered. I...
6  1731  Amazon SNS  SMS delivery fails with "Phone is currently unreachable/unavailable" Hi all, Started observing this behaviour today (it used to work until last we...
7  1730  Amazon SNS  Budget Alert SNS message sample Hi, I'm implementing a lambda that will receive a SNS message from a Budget Alert created on billing panel but I c...
8  1730  Amazon SNS  Hello on the Lambda page, when you configure test event, there is the option to select an event teplate. Look for AWS / Amazon SNS Topic Notificat...
9  1729  Amazon SNS  SNS Mobile Push: Endpoints use "default" content instead of "APNS" Recently diving in to the SNS Mobile Push API. I have registered many iOS clien...
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4601 entries, 0 to 4600
Data columns (total 3 columns):
id             4601 non-null int64
label          4601 non-null object
description    4601 non-null object
dtypes: int64(1), object(2)
memory usage: 107.9+ KB

Check out one sample post:

In [5]:
p = 2000

df['description'][p]
Out[5]:
"...ADMMessageHandler: java.lang.RuntimeException: Stub! Hello all, I'm trying to run KindleMobilePushApp example in a Kindle Fire with software version 8.5.1_user_5159720 but I get the error below. What am I doing wrong?. Thank you. Ivan Martinez 09-28 14:06:30.594 23659-23659/? W/dalvikvm﹕ threadid=1: thread exiting with uncaught exception (group=0x40b05228) 09-28 14:06:30.594 23659-23659/? E/AndroidRuntime﹕ FATAL EXCEPTION: main java.lang.RuntimeException: Unable to instantiate service com.amazonaws.kindletest.ADMMessageHandler: java.lang.RuntimeException: Stub! at android.app.ActivityThread.handleCreateService(ActivityThread.java:2315) at android.app.ActivityThread.access$1600(ActivityThread.java:127) at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1231) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:137) at android.app.ActivityThread.main(ActivityThread.java:4533) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:511) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551) at dalvik.system.NativeStart.main(Native Method) Caused by: java.lang.RuntimeException: Stub! 
at com.amazon.device.messaging.ADMMessageHandlerBase.<init>(Unknown Source) at com.amazonaws.kindletest.ADMMessageHandler.<init>(ADMMessageHandler.java:32) at java.lang.Class.newInstanceImpl(Native Method) at java.lang.Class.newInstance(Class.java:1319) at android.app.ActivityThread.handleCreateService(ActivityThread.java:2312) at android.app.ActivityThread.access$1600(ActivityThread.java:127) at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1231) at android.os.Handler.dispatchMessage(Handler.java:99) at android.os.Looper.loop(Looper.java:137) at android.app.ActivityThread.main(ActivityThread.java:4533) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:511) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:784) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:551) at dalvik.system.NativeStart.main(Native Method) 09-28 14:11:30.641 23659-23659/? E/AndroidRuntime﹕ Handle UnCaght exceptions. KILLING PID: 23659"

Top 30 words and the frequency of each, before cleaning:

In [6]:
pd.Series(' '.join(df['description']).split()).value_counts()[:30]
Out[6]:
the        14498
to         12038
I           6847
a           6257
and         4967
is          4822
for         4007
SNS         3828
in          3592
that        3214
of          3131
you         2882
this        2753
on          2586
it          2439
have        2385
with        2278
not         2222
be          2019
an          1892
can         1784
are         1727
from        1658
message     1545
my          1540
but         1519
as          1393
=           1319
using       1316
we          1271
dtype: int64
In [7]:
print("There are a total of", df['description'].apply(lambda x: len(x.split(' '))).sum(), "words before cleaning.")
There are a total of 345285 words before cleaning.
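
The word count in cell 7 splits each description on a single space. A minimal sketch of how that differs from a bare `split()`, using two hypothetical helper names (`count_split_space`, `count_split_any`) that are not part of the notebook:

```python
def count_split_space(text):
    # Same expression as the notebook: split on a single space character
    return len(text.split(' '))

def count_split_any(text):
    # Bare split(): splits on any run of whitespace and never yields empty strings
    return len(text.split())

samples = ["send sms message", "", "two  spaces"]
for s in samples:
    print(repr(s), count_split_space(s), count_split_any(s))
```

Because cell 3 already collapses whitespace, the two counts should agree on every non-empty description. They differ for an empty description, which `split(' ')` counts as one word.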

(B) Text Pre-processing

In [8]:
STOPWORDS = stopwords.words('english')
my_stop_words = ["hi", "hello", "regards", "thank", "thanks", "regard", "best", "wishes", "hey", "amazon", "aws", "s3",
"elastic", "beanstalk", "rds", "ec2", "lambda", "cloudfront", "cloud", "front", "vpc", "sns", "me",
"january", "february", "march", "april", "may", "june", "july", "august", "september", "october", 
"november", "december", "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "sept", "oct", "nov",
"dec", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday", "mon", "tue",
"wed", "thu", "fri", "sat", "sun", "ain't", "aren't", "can't", "can't've", "'cause", "could've", "couldn't",
"couldn't've", "didn't", "doesn't", "don't", "hadn't", "hadn't've", "hasn't", "haven't", "he'd", "he'd've",
"he'll", "he'll've", "he's", "how'd", "how'd'y", "how'll", "how's", "i'd", "i'd've", "i'll", "i'll've", "i'm",
"i've", "isn't", "it'd", "it'd've", "it'll", "it'll've", "it's", "let's", "mayn't", "might've", "mightn't",
"mightn't've", "must've", "mustn't", "mustn't've", "needn't", "needn't've", "oughtn't", "oughtn't've", "shan't",
"sha'n't", "shan't've", "she'd", "she'd've", "she'll", "she'll've", "she's", "should've", "shouldn't", "shouldn't've",
"so've", "so's", "that'd", "that'd've", "that's", "there'd", "there'd've", "there's", "they'd", "they'd've", "they'll",
"they'll've", "they're", "they've", "to've", "wasn't", "we'd", "we'd've", "we'll", "we'll've", "we're", "we've",
"weren't", "what'll", "what'll've", "what're", "what's", "what've", "when's", "when've", "where'd", "where's",
"where've", "who'll", "who'll've", "who's", "who've", "why's", "why've", "will've", "won't", "won't've", "would've",
"wouldn't", "wouldn't've", "yall", "yalld", "yalldve", "yallre", "yallve", "youd", "youdve", "youll",
"youllve", "youre", "youve", "do", "did", "does", "had", "have", "has", "could", "can", "as", "is",
"shall", "should", "would", "will", "you", "me", "please", "know", "who", "we", "was", "were", "edited", "by", "pm"]

name = names.words()
STOPWORDS.extend(my_stop_words)
STOPWORDS.extend(name)

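# Characters replaced by a space so the text on either side stays separate
REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,:;#+?]')
# Everything except digits, lowercase letters, space, '_' and '.' is deleted.
# Note: ' - ' inside the class parses as a space-to-space range, so '-' is deleted too.
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z - _.]+')
REMOVE_HTML_RE = re.compile(r'<.*?>')
REMOVE_HTTP_RE = re.compile(r'http\S+')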

STOPWORDS = [BAD_SYMBOLS_RE.sub('', x) for x in STOPWORDS]
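
The last line of cell 8 runs `BAD_SYMBOLS_RE` over the stop words without lowercasing them first. Since the pattern deletes uppercase letters, a capitalised entry loses its first letter, which may be why "ivan" is still present in the output of cell 16. A small sketch of the effect and one possible fix; the sample entries and the helper `normalize_stop_words` are assumptions for illustration, not the notebook's code:

```python
import re

# Same pattern as the notebook, written as a raw string
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z - _.]+')

def normalize_stop_words(words):
    """Lowercase first, then delete disallowed symbols; return a set for fast lookup."""
    return {BAD_SYMBOLS_RE.sub('', w.lower()) for w in words}

sample = ["Ivan", "can't", "Hello"]  # hypothetical sample entries

# What the notebook's expression does to these entries
without_lowercasing = [BAD_SYMBOLS_RE.sub('', w) for w in sample]
print(without_lowercasing)                    # ['van', 'cant', 'ello']

# Lowercasing before the substitution keeps the whole word
print(sorted(normalize_stop_words(sample)))   # ['cant', 'hello', 'ivan']
```

Removing the apostrophe from contractions is intentional here: the text itself loses apostrophes in cell 13, so "can't" has to become "cant" to match.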

Convert to lowercase

In [9]:
df['description'] = df['description'].apply(lambda x: " ".join(x.lower() for x in str(x).split(" ")))

df['description'][p]
Out[9]:
"...admmessagehandler: java.lang.runtimeexception: stub! hello all, i'm trying to run kindlemobilepushapp example in a kindle fire with software version 8.5.1_user_5159720 but i get the error below. what am i doing wrong?. thank you. ivan martinez 09-28 14:06:30.594 23659-23659/? w/dalvikvm﹕ threadid=1: thread exiting with uncaught exception (group=0x40b05228) 09-28 14:06:30.594 23659-23659/? e/androidruntime﹕ fatal exception: main java.lang.runtimeexception: unable to instantiate service com.amazonaws.kindletest.admmessagehandler: java.lang.runtimeexception: stub! at android.app.activitythread.handlecreateservice(activitythread.java:2315) at android.app.activitythread.access$1600(activitythread.java:127) at android.app.activitythread$h.handlemessage(activitythread.java:1231) at android.os.handler.dispatchmessage(handler.java:99) at android.os.looper.loop(looper.java:137) at android.app.activitythread.main(activitythread.java:4533) at java.lang.reflect.method.invokenative(native method) at java.lang.reflect.method.invoke(method.java:511) at com.android.internal.os.zygoteinit$methodandargscaller.run(zygoteinit.java:784) at com.android.internal.os.zygoteinit.main(zygoteinit.java:551) at dalvik.system.nativestart.main(native method) caused by: java.lang.runtimeexception: stub! 
at com.amazon.device.messaging.admmessagehandlerbase.<init>(unknown source) at com.amazonaws.kindletest.admmessagehandler.<init>(admmessagehandler.java:32) at java.lang.class.newinstanceimpl(native method) at java.lang.class.newinstance(class.java:1319) at android.app.activitythread.handlecreateservice(activitythread.java:2312) at android.app.activitythread.access$1600(activitythread.java:127) at android.app.activitythread$h.handlemessage(activitythread.java:1231) at android.os.handler.dispatchmessage(handler.java:99) at android.os.looper.loop(looper.java:137) at android.app.activitythread.main(activitythread.java:4533) at java.lang.reflect.method.invokenative(native method) at java.lang.reflect.method.invoke(method.java:511) at com.android.internal.os.zygoteinit$methodandargscaller.run(zygoteinit.java:784) at com.android.internal.os.zygoteinit.main(zygoteinit.java:551) at dalvik.system.nativestart.main(native method) 09-28 14:11:30.641 23659-23659/? e/androidruntime﹕ handle uncaght exceptions. killing pid: 23659"

Remove all HTML tags

In [10]:
df['description'] = df['description'].apply(lambda x: " ".join(REMOVE_HTML_RE.sub(' ', x) for x in str(x).split()))

df['description'][p]
Out[10]:
"...admmessagehandler: java.lang.runtimeexception: stub! hello all, i'm trying to run kindlemobilepushapp example in a kindle fire with software version 8.5.1_user_5159720 but i get the error below. what am i doing wrong?. thank you. ivan martinez 09-28 14:06:30.594 23659-23659/? w/dalvikvm﹕ threadid=1: thread exiting with uncaught exception (group=0x40b05228) 09-28 14:06:30.594 23659-23659/? e/androidruntime﹕ fatal exception: main java.lang.runtimeexception: unable to instantiate service com.amazonaws.kindletest.admmessagehandler: java.lang.runtimeexception: stub! at android.app.activitythread.handlecreateservice(activitythread.java:2315) at android.app.activitythread.access$1600(activitythread.java:127) at android.app.activitythread$h.handlemessage(activitythread.java:1231) at android.os.handler.dispatchmessage(handler.java:99) at android.os.looper.loop(looper.java:137) at android.app.activitythread.main(activitythread.java:4533) at java.lang.reflect.method.invokenative(native method) at java.lang.reflect.method.invoke(method.java:511) at com.android.internal.os.zygoteinit$methodandargscaller.run(zygoteinit.java:784) at com.android.internal.os.zygoteinit.main(zygoteinit.java:551) at dalvik.system.nativestart.main(native method) caused by: java.lang.runtimeexception: stub! at com.amazon.device.messaging.admmessagehandlerbase. (unknown source) at com.amazonaws.kindletest.admmessagehandler. 
(admmessagehandler.java:32) at java.lang.class.newinstanceimpl(native method) at java.lang.class.newinstance(class.java:1319) at android.app.activitythread.handlecreateservice(activitythread.java:2312) at android.app.activitythread.access$1600(activitythread.java:127) at android.app.activitythread$h.handlemessage(activitythread.java:1231) at android.os.handler.dispatchmessage(handler.java:99) at android.os.looper.loop(looper.java:137) at android.app.activitythread.main(activitythread.java:4533) at java.lang.reflect.method.invokenative(native method) at java.lang.reflect.method.invoke(method.java:511) at com.android.internal.os.zygoteinit$methodandargscaller.run(zygoteinit.java:784) at com.android.internal.os.zygoteinit.main(zygoteinit.java:551) at dalvik.system.nativestart.main(native method) 09-28 14:11:30.641 23659-23659/? e/androidruntime﹕ handle uncaght exceptions. killing pid: 23659"
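
Cell 10 applies the tag pattern to one whitespace-separated token at a time. A sketch of what that implies, on a made-up sample string (the string is an assumption, the pattern is the notebook's): a tag that contains a space is split across tokens and its opening half is left behind, and anything in angle brackets, such as `<init>` in a Java stack trace, is treated as a tag.

```python
import re

REMOVE_HTML_RE = re.compile(r'<.*?>')  # same pattern as the notebook

# Hypothetical sample: two real tags plus a Java-style "<init>"
text = 'see <b>bold</b> and <a href="x">link</a> in handler.<init>(unknown source)'

# Per-token application, as in cell 10
per_token = " ".join(REMOVE_HTML_RE.sub(' ', tok) for tok in text.split())

# Whole-string application, then whitespace is collapsed
whole = " ".join(REMOVE_HTML_RE.sub(' ', text).split())

print(per_token)
print(whole)  # see bold and link in handler. (unknown source)
```

Applying the pattern to the whole string handles tags with attributes, but it still removes `<init>`, as the sample post above shows.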

Remove URLs

In [11]:
df['description'] = df['description'].apply(lambda x: " ".join(REMOVE_HTTP_RE.sub(' ', x) for x in str(x).split()))

df['description'][p]
Out[11]:
"...admmessagehandler: java.lang.runtimeexception: stub! hello all, i'm trying to run kindlemobilepushapp example in a kindle fire with software version 8.5.1_user_5159720 but i get the error below. what am i doing wrong?. thank you. ivan martinez 09-28 14:06:30.594 23659-23659/? w/dalvikvm﹕ threadid=1: thread exiting with uncaught exception (group=0x40b05228) 09-28 14:06:30.594 23659-23659/? e/androidruntime﹕ fatal exception: main java.lang.runtimeexception: unable to instantiate service com.amazonaws.kindletest.admmessagehandler: java.lang.runtimeexception: stub! at android.app.activitythread.handlecreateservice(activitythread.java:2315) at android.app.activitythread.access$1600(activitythread.java:127) at android.app.activitythread$h.handlemessage(activitythread.java:1231) at android.os.handler.dispatchmessage(handler.java:99) at android.os.looper.loop(looper.java:137) at android.app.activitythread.main(activitythread.java:4533) at java.lang.reflect.method.invokenative(native method) at java.lang.reflect.method.invoke(method.java:511) at com.android.internal.os.zygoteinit$methodandargscaller.run(zygoteinit.java:784) at com.android.internal.os.zygoteinit.main(zygoteinit.java:551) at dalvik.system.nativestart.main(native method) caused by: java.lang.runtimeexception: stub! at com.amazon.device.messaging.admmessagehandlerbase. (unknown source) at com.amazonaws.kindletest.admmessagehandler. 
(admmessagehandler.java:32) at java.lang.class.newinstanceimpl(native method) at java.lang.class.newinstance(class.java:1319) at android.app.activitythread.handlecreateservice(activitythread.java:2312) at android.app.activitythread.access$1600(activitythread.java:127) at android.app.activitythread$h.handlemessage(activitythread.java:1231) at android.os.handler.dispatchmessage(handler.java:99) at android.os.looper.loop(looper.java:137) at android.app.activitythread.main(activitythread.java:4533) at java.lang.reflect.method.invokenative(native method) at java.lang.reflect.method.invoke(method.java:511) at com.android.internal.os.zygoteinit$methodandargscaller.run(zygoteinit.java:784) at com.android.internal.os.zygoteinit.main(zygoteinit.java:551) at dalvik.system.nativestart.main(native method) 09-28 14:11:30.641 23659-23659/? e/androidruntime﹕ handle uncaght exceptions. killing pid: 23659"

Replace certain characters with a space (slashes, brackets, parentheses, etc.)

In [12]:
df['description'] = df['description'].apply(lambda x: " ".join(REPLACE_BY_SPACE_RE.sub(' ', x) for x in str(x).split()))

df['description'][p]
Out[12]:
"...admmessagehandler  java.lang.runtimeexception  stub! hello all  i'm trying to run kindlemobilepushapp example in a kindle fire with software version 8.5.1_user_5159720 but i get the error below. what am i doing wrong . thank you. ivan martinez 09-28 14 06 30.594 23659-23659   w dalvikvm﹕ threadid=1  thread exiting with uncaught exception  group=0x40b05228  09-28 14 06 30.594 23659-23659   e androidruntime﹕ fatal exception  main java.lang.runtimeexception  unable to instantiate service com.amazonaws.kindletest.admmessagehandler  java.lang.runtimeexception  stub! at android.app.activitythread.handlecreateservice activitythread.java 2315  at android.app.activitythread.access$1600 activitythread.java 127  at android.app.activitythread$h.handlemessage activitythread.java 1231  at android.os.handler.dispatchmessage handler.java 99  at android.os.looper.loop looper.java 137  at android.app.activitythread.main activitythread.java 4533  at java.lang.reflect.method.invokenative native method  at java.lang.reflect.method.invoke method.java 511  at com.android.internal.os.zygoteinit$methodandargscaller.run zygoteinit.java 784  at com.android.internal.os.zygoteinit.main zygoteinit.java 551  at dalvik.system.nativestart.main native method  caused by  java.lang.runtimeexception  stub! at com.amazon.device.messaging.admmessagehandlerbase.  unknown source  at com.amazonaws.kindletest.admmessagehandler.  
admmessagehandler.java 32  at java.lang.class.newinstanceimpl native method  at java.lang.class.newinstance class.java 1319  at android.app.activitythread.handlecreateservice activitythread.java 2312  at android.app.activitythread.access$1600 activitythread.java 127  at android.app.activitythread$h.handlemessage activitythread.java 1231  at android.os.handler.dispatchmessage handler.java 99  at android.os.looper.loop looper.java 137  at android.app.activitythread.main activitythread.java 4533  at java.lang.reflect.method.invokenative native method  at java.lang.reflect.method.invoke method.java 511  at com.android.internal.os.zygoteinit$methodandargscaller.run zygoteinit.java 784  at com.android.internal.os.zygoteinit.main zygoteinit.java 551  at dalvik.system.nativestart.main native method  09-28 14 11 30.641 23659-23659   e androidruntime﹕ handle uncaght exceptions. killing pid  23659"
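
A short sketch of why these characters are replaced with a space instead of being deleted, using one stack frame from the sample post. The variable names are illustrative only:

```python
import re

REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,:;#+?]')  # same pattern as the notebook

frame = "android.os.looper.loop(looper.java:137)"

# Replacing with a space keeps the three parts as separate tokens
spaced = " ".join(REPLACE_BY_SPACE_RE.sub(' ', frame).split())
# Deleting the characters would fuse them into one token
deleted = REPLACE_BY_SPACE_RE.sub('', frame)

print(spaced)   # android.os.looper.loop looper.java 137
print(deleted)  # android.os.looper.looplooper.java137
```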

Remove any remaining unwanted symbols (such as $, !, = and -)

In [13]:
df['description'] = df['description'].apply(lambda x: " ".join(BAD_SYMBOLS_RE.sub('', x) for x in str(x).split()))

df['description'][p]
Out[13]:
'...admmessagehandler java.lang.runtimeexception stub hello all im trying to run kindlemobilepushapp example in a kindle fire with software version 8.5.1_user_5159720 but i get the error below. what am i doing wrong . thank you. ivan martinez 0928 14 06 30.594 2365923659 w dalvikvm threadid1 thread exiting with uncaught exception group0x40b05228 0928 14 06 30.594 2365923659 e androidruntime fatal exception main java.lang.runtimeexception unable to instantiate service com.amazonaws.kindletest.admmessagehandler java.lang.runtimeexception stub at android.app.activitythread.handlecreateservice activitythread.java 2315 at android.app.activitythread.access1600 activitythread.java 127 at android.app.activitythreadh.handlemessage activitythread.java 1231 at android.os.handler.dispatchmessage handler.java 99 at android.os.looper.loop looper.java 137 at android.app.activitythread.main activitythread.java 4533 at java.lang.reflect.method.invokenative native method at java.lang.reflect.method.invoke method.java 511 at com.android.internal.os.zygoteinitmethodandargscaller.run zygoteinit.java 784 at com.android.internal.os.zygoteinit.main zygoteinit.java 551 at dalvik.system.nativestart.main native method caused by java.lang.runtimeexception stub at com.amazon.device.messaging.admmessagehandlerbase. unknown source at com.amazonaws.kindletest.admmessagehandler. 
admmessagehandler.java 32 at java.lang.class.newinstanceimpl native method at java.lang.class.newinstance class.java 1319 at android.app.activitythread.handlecreateservice activitythread.java 2312 at android.app.activitythread.access1600 activitythread.java 127 at android.app.activitythreadh.handlemessage activitythread.java 1231 at android.os.handler.dispatchmessage handler.java 99 at android.os.looper.loop looper.java 137 at android.app.activitythread.main activitythread.java 4533 at java.lang.reflect.method.invokenative native method at java.lang.reflect.method.invoke method.java 511 at com.android.internal.os.zygoteinitmethodandargscaller.run zygoteinit.java 784 at com.android.internal.os.zygoteinit.main zygoteinit.java 551 at dalvik.system.nativestart.main native method 0928 14 11 30.641 2365923659 e androidruntime handle uncaght exceptions. killing pid 23659'
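
A small sketch of what `BAD_SYMBOLS_RE` keeps and deletes, run on tokens taken from the sample post. Hyphens are deleted as well, because ` - ` inside the character class is read as a range from space to space and not as a literal hyphen:

```python
import re

BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z - _.]+')  # same pattern as the notebook

tokens = ["09-28", "threadid=1", "access$1600", "8.5.1_user"]
cleaned = [BAD_SYMBOLS_RE.sub('', t) for t in tokens]
print(cleaned)  # ['0928', 'threadid1', 'access1600', '8.5.1_user']
```

One consequence is that no hyphens remain for the `strip('-')` pass in the next cell to remove.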

Strip leading and trailing '.', '-' and '_' from each token

In [14]:
df['description'] = df['description'].apply(lambda x: " ".join(x.strip('.') for x in x.split()))
df['description'] = df['description'].apply(lambda x: " ".join(x.strip('-') for x in x.split()))
df['description'] = df['description'].apply(lambda x: " ".join(x.strip('_') for x in x.split()))
df['description'][p]
Out[14]:
'admmessagehandler java.lang.runtimeexception stub hello all im trying to run kindlemobilepushapp example in a kindle fire with software version 8.5.1_user_5159720 but i get the error below what am i doing wrong thank you ivan martinez 0928 14 06 30.594 2365923659 w dalvikvm threadid1 thread exiting with uncaught exception group0x40b05228 0928 14 06 30.594 2365923659 e androidruntime fatal exception main java.lang.runtimeexception unable to instantiate service com.amazonaws.kindletest.admmessagehandler java.lang.runtimeexception stub at android.app.activitythread.handlecreateservice activitythread.java 2315 at android.app.activitythread.access1600 activitythread.java 127 at android.app.activitythreadh.handlemessage activitythread.java 1231 at android.os.handler.dispatchmessage handler.java 99 at android.os.looper.loop looper.java 137 at android.app.activitythread.main activitythread.java 4533 at java.lang.reflect.method.invokenative native method at java.lang.reflect.method.invoke method.java 511 at com.android.internal.os.zygoteinitmethodandargscaller.run zygoteinit.java 784 at com.android.internal.os.zygoteinit.main zygoteinit.java 551 at dalvik.system.nativestart.main native method caused by java.lang.runtimeexception stub at com.amazon.device.messaging.admmessagehandlerbase unknown source at com.amazonaws.kindletest.admmessagehandler admmessagehandler.java 32 at java.lang.class.newinstanceimpl native method at java.lang.class.newinstance class.java 1319 at android.app.activitythread.handlecreateservice activitythread.java 2312 at android.app.activitythread.access1600 activitythread.java 127 at android.app.activitythreadh.handlemessage activitythread.java 1231 at android.os.handler.dispatchmessage handler.java 99 at android.os.looper.loop looper.java 137 at android.app.activitythread.main activitythread.java 4533 at java.lang.reflect.method.invokenative native method at java.lang.reflect.method.invoke method.java 511 at 
com.android.internal.os.zygoteinitmethodandargscaller.run zygoteinit.java 784 at com.android.internal.os.zygoteinit.main zygoteinit.java 551 at dalvik.system.nativestart.main native method 0928 14 11 30.641 2365923659 e androidruntime handle uncaght exceptions killing pid 23659'
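
Cell 14 strips the three characters in three separate passes. A sketch comparing that with a single `strip('.-_')` call; both helper names are hypothetical. The results differ only when a token has a mix of these characters at one end:

```python
def strip_sequential(token):
    # Mirrors cell 14: three separate passes, in this order
    return token.strip('.').strip('-').strip('_')

def strip_combined(token):
    # One pass that removes any mix of '.', '-' and '_' from both ends
    return token.strip('.-_')

for tok in ["...admmessagehandler", "handler.", "_.x", "8.5.1_user"]:
    print(tok, "->", strip_sequential(tok), "|", strip_combined(tok))
```

For `"_.x"` the sequential version leaves `".x"`, because the dot is hidden behind the underscore during the first pass, while the combined call returns `"x"`.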

Remove tokens that consist only of digits

In [15]:
df['description'] = df['description'].apply(lambda x: " ".join(x for x in x.split() if not x.isdigit()))

df['description'][p]
Out[15]:
'admmessagehandler java.lang.runtimeexception stub hello all im trying to run kindlemobilepushapp example in a kindle fire with software version 8.5.1_user_5159720 but i get the error below what am i doing wrong thank you ivan martinez 30.594 w dalvikvm threadid1 thread exiting with uncaught exception group0x40b05228 30.594 e androidruntime fatal exception main java.lang.runtimeexception unable to instantiate service com.amazonaws.kindletest.admmessagehandler java.lang.runtimeexception stub at android.app.activitythread.handlecreateservice activitythread.java at android.app.activitythread.access1600 activitythread.java at android.app.activitythreadh.handlemessage activitythread.java at android.os.handler.dispatchmessage handler.java at android.os.looper.loop looper.java at android.app.activitythread.main activitythread.java at java.lang.reflect.method.invokenative native method at java.lang.reflect.method.invoke method.java at com.android.internal.os.zygoteinitmethodandargscaller.run zygoteinit.java at com.android.internal.os.zygoteinit.main zygoteinit.java at dalvik.system.nativestart.main native method caused by java.lang.runtimeexception stub at com.amazon.device.messaging.admmessagehandlerbase unknown source at com.amazonaws.kindletest.admmessagehandler admmessagehandler.java at java.lang.class.newinstanceimpl native method at java.lang.class.newinstance class.java at android.app.activitythread.handlecreateservice activitythread.java at android.app.activitythread.access1600 activitythread.java at android.app.activitythreadh.handlemessage activitythread.java at android.os.handler.dispatchmessage handler.java at android.os.looper.loop looper.java at android.app.activitythread.main activitythread.java at java.lang.reflect.method.invokenative native method at java.lang.reflect.method.invoke method.java at com.android.internal.os.zygoteinitmethodandargscaller.run zygoteinit.java at com.android.internal.os.zygoteinit.main zygoteinit.java at 
dalvik.system.nativestart.main native method 30.641 e androidruntime handle uncaght exceptions killing pid'
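
`str.isdigit()` is true only for tokens made entirely of digits, so decimal values such as 30.594 remain in the output above. If those should go too, one option is a full-match regular expression. This is a sketch; `NUMERIC_RE` is a hypothetical name and not part of the notebook:

```python
import re

# Hypothetical pattern: digits, optionally followed by more ".digits" groups
NUMERIC_RE = re.compile(r'\d+(?:\.\d+)*')

tokens = "version 8.5.1_user 30.594 2315 pid".split()

only_isdigit = [t for t in tokens if not t.isdigit()]             # notebook behaviour
numeric_too = [t for t in tokens if not NUMERIC_RE.fullmatch(t)]  # also drops 30.594

print(only_isdigit)  # ['version', '8.5.1_user', '30.594', 'pid']
print(numeric_too)   # ['version', '8.5.1_user', 'pid']
```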

Remove the stop words

In [16]:
df['description'] = df['description'].apply(lambda x: " ".join(x for x in x.split() if x not in STOPWORDS
                                                               and len(x) > 1))

df['description'][p]
Out[16]:
'admmessagehandler java.lang.runtimeexception stub trying run kindlemobilepushapp example kindle fire software version 8.5.1_user_5159720 get error wrong ivan martinez 30.594 dalvikvm threadid1 thread exiting uncaught exception group0x40b05228 30.594 androidruntime fatal exception main java.lang.runtimeexception unable instantiate service com.amazonaws.kindletest.admmessagehandler java.lang.runtimeexception stub android.app.activitythread.handlecreateservice activitythread.java android.app.activitythread.access1600 activitythread.java android.app.activitythreadh.handlemessage activitythread.java android.os.handler.dispatchmessage handler.java android.os.looper.loop looper.java android.app.activitythread.main activitythread.java java.lang.reflect.method.invokenative native method java.lang.reflect.method.invoke method.java com.android.internal.os.zygoteinitmethodandargscaller.run zygoteinit.java com.android.internal.os.zygoteinit.main zygoteinit.java dalvik.system.nativestart.main native method caused java.lang.runtimeexception stub com.amazon.device.messaging.admmessagehandlerbase unknown source com.amazonaws.kindletest.admmessagehandler admmessagehandler.java java.lang.class.newinstanceimpl native method java.lang.class.newinstance class.java android.app.activitythread.handlecreateservice activitythread.java android.app.activitythread.access1600 activitythread.java android.app.activitythreadh.handlemessage activitythread.java android.os.handler.dispatchmessage handler.java android.os.looper.loop looper.java android.app.activitythread.main activitythread.java java.lang.reflect.method.invokenative native method java.lang.reflect.method.invoke method.java com.android.internal.os.zygoteinitmethodandargscaller.run zygoteinit.java com.android.internal.os.zygoteinit.main zygoteinit.java dalvik.system.nativestart.main native method 30.641 androidruntime handle uncaght exceptions killing pid'
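
`STOPWORDS` is a list that also holds the whole names corpus, so `x not in STOPWORDS` scans the list for every token. A sketch of the same filter with a set built once up front; the helper name and the small stop-word subset are assumptions for illustration:

```python
def remove_stop_words(text, stop_set):
    """Drop stop words and single-character tokens, as in cell 16."""
    return " ".join(t for t in text.split() if t not in stop_set and len(t) > 1)

# Hypothetical subset of the stop words; built once, membership tests are O(1) on average
sample_stop_set = frozenset(["hello", "all", "im", "to", "in", "a", "with"])

text = "hello all im trying to run kindlemobilepushapp example in a kindle fire"
filtered = remove_stop_words(text, sample_stop_set)
print(filtered)  # trying run kindlemobilepushapp example kindle fire
```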

Results after cleaning data:

In [17]:
df.head()
Out[17]:
   id    label       description
0  1737  Amazon SNS  sms failure delivery retrieval detailed sms log using service sms notification event failed sms get acknowledgment containing detailed log object ...
1  1736  Amazon SNS  messages coming multiple accounts single sms short code use simple notification service send transactional sms messages clients requested notifica...
2  1735  Amazon SNS  android fcm support unity plugin unity plugin build android due use depricated gcm api awsunitygcmwrapper going update plugin builds fcm anyone go...
3  1734  Amazon SNS  otp message delivery issue facing issue otp message delivery production environment mix behavior message getting delivered message delivered user ...
4  1733  Amazon SNS  simulating message persistence using sqs evaluating messaging requirements integrate multiple applications single producer publishes messages mult...

Top 30 words and the frequency of each, after cleaning:

In [18]:
pd.Series(' '.join(df['description']).split()).value_counts()[:30]
Out[18]:
message          2498
topic            1754
endpoint         1500
using            1439
sms              1247
send             1246
error            1223
notification     1200
push             1088
request          1060
messages         1041
get              1022
code             1000
subscription      999
use               971
notifications     885
service           876
new               846
app               834
application       807
publish           740
like              732
issue             731
one               691
email             688
console           664
need              625
token             610
device            605
create            599
dtype: int64
In [19]:
print("There are a total of", df['description'].apply(lambda x: len(x.split(' '))).sum(), "words after cleaning.")
There are a total of 168416 words after cleaning.

(C) Write to CleanText.csv

In [20]:
with open('C:\\Users\\Aruna\\Documents\\ACMS-IID\\input\\CleanText.csv', 'a', encoding='utf-8', newline='') as csvfile:
    writer = csv.writer(csvfile)
    # writer.writerow(['id', 'label', 'description'])
    for i in range(0, len(df['description'])):
        if len(df['description'][i]) > 1:
            writer.writerow([df['id'][i], df['label'][i], df['description'][i]])
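
Cell 20 opens the file in append mode with the header row commented out, presumably so that several input files can be written to one CleanText.csv. A sketch of one way to keep the header without repeating it, by writing it only when the file is new or empty. The helper `append_rows` and the demo path are assumptions, not the notebook's code:

```python
import csv
import os
import tempfile

def append_rows(path, rows, header=("id", "label", "description")):
    """Append rows to a CSV, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(header)
        for row in rows:
            # Mirror the notebook's filter: skip descriptions of length <= 1
            if len(row[2]) > 1:
                writer.writerow(row)

# Demo in a temporary directory (hypothetical file name)
demo_path = os.path.join(tempfile.mkdtemp(), "CleanText_demo.csv")
append_rows(demo_path, [(1737, "Amazon SNS", "sms failure delivery"), (1736, "Amazon SNS", "")])
append_rows(demo_path, [(1735, "Amazon SNS", "android fcm support")])

with open(demo_path, encoding="utf-8", newline="") as f:
    demo_rows = list(csv.reader(f))
print(demo_rows)
```

The second call finds a non-empty file and skips the header, so each cleaned input file can be appended in turn.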

(D) Generate the word cloud

In [21]:
msgs = " ".join(str(msg) for msg in df['description'])
fig, ax = plt.subplots(1, 1, figsize  = (100,100))
wordcloud = WordCloud(max_font_size = 20, max_words = 20, background_color = "white").generate(msgs)
ax.imshow(wordcloud, interpolation='bilinear')
ax.axis('off')
Out[21]:
(-0.5, 399.5, 199.5, -0.5)
In [22]:
msgs = " ".join(str(msg) for msg in df['description'])
fig, ax = plt.subplots(1, 1, figsize  = (100,100))
wordcloud = WordCloud(max_font_size = 20, max_words = 50, background_color = "white").generate(msgs)
ax.imshow(wordcloud, interpolation='bilinear')
ax.axis('off')
Out[22]:
(-0.5, 399.5, 199.5, -0.5)
In [23]:
msgs = " ".join(str(msg) for msg in df['description'])
fig, ax = plt.subplots(1, 1, figsize  = (100,100))
wordcloud = WordCloud(max_font_size = 20, max_words = 100, background_color = "white").generate(msgs)
ax.imshow(wordcloud, interpolation='bilinear')
ax.axis('off')
Out[23]:
(-0.5, 399.5, 199.5, -0.5)
In [24]:
msgs = " ".join(str(msg) for msg in df['description'])
fig, ax = plt.subplots(1, 1, figsize  = (100,100))
wordcloud = WordCloud(max_font_size = 20, max_words = 500, background_color = "white").generate(msgs)
ax.imshow(wordcloud, interpolation='bilinear')
ax.axis('off')
Out[24]:
(-0.5, 399.5, 199.5, -0.5)
In [25]:
msgs = " ".join(str(msg) for msg in df['description'])
fig, ax = plt.subplots(1, 1, figsize  = (100,100))
wordcloud = WordCloud(max_font_size = 20, max_words = 1000, background_color = "white").generate(msgs)
ax.imshow(wordcloud, interpolation='bilinear')
ax.axis('off')
Out[25]:
(-0.5, 399.5, 199.5, -0.5)